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Abstract 

This paper deals with the definition and optimization of augmenta- 
tion spaces for faster convergence of the conjugate gradient method in the 
resolution of sequences of linear systems. Using advanced convergence 
results from the literature, we present a procedure based on a selection of 
relevant approximations of the eigenspaces for extracting, selecting and 
reusing information from the Krylov subspaces generated by previous so- 
lutions in order to accelerate the current iteration. Assessments of the 
method are proposed in the cases of both linear and nonlinear structural 
problems. 
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1 Introduction 

Accelerating the convergence of Krylov iterative solvers [44] is an old issue which 
has returned to the spotlight because of the increasing number of applications 
for which these are preferred to direct solvers today. Traditional approaches 
aim at improving the condition number by using frameworks in which efficient 
preconditioners exist (e.g. domain decomposition methods [25] [17]), or for which 
good initialization vectors [19 , relevant augmentation subspaces [6] [43] [5] or 
suitable block strategies (see [lj for a very general block-Lanczos algorithm) are 
available. For instance: 

• For 3D elasticity problems, domain decomposition methods come with 
"physical" augmentation associated with the global equilibrium of floating 
substructures (rigid body motions), which makes the methods scalable 
[T3] [30] ; for plate and shell problems, additional augmentation through 
"corner modes" [TTJ [9j [26] is required. 



• For structures with repeated patterns, block strategies are possible [20] . 



• For restarted algorithms, one can use deflation or augmentation [33, 7 , or 
block techniques [3 . 

• For problems with multiple right-hand sides, deflation [10] HUE] is a rather 
classical approach. 

The problem with these techniques is that they require some a priori informa- 
tion which is seldom available, except in specific cases. 

Many recent works present theoretical and practical comparisons of the nu- 
merous algorithms which have been developed in connection with these ideas 

im sz]. 

Multiresolution approaches form a general framework in which numerical 
information is available to accelerate the convergence of Krylov solvers. Mul- 
tiresolution refers to situations in which the solution of a mechanical problem 
cannot be achieved through the resolution of a single linear system. For ex- 
ample, calculating the solution of a nonlinear or time-dependent problem or 
exploring a design of experiment during an optimization procedure requires the 
resolution of sequences of linear systems. Multiresolution is more general than 
using multiple right-hand sides because the matrices themselves are likely to 
change from one system to another (multiple right-hand and left-hand sides). 
Thus, the problem consists in solving a /c-indexed family of large, sparse, linear 
n x n systems of the form: 

A (k) x ( k) = b (k) (1) 

Although different, the systems are assumed to be similar to one another. This 
similarity can be defined in several ways: in terms of rank, by the fact that 
rank(A^) - A (/c_1) ) <C rank(A^)); or in a spectral sense by the fact that 
the eigenspaces remain stable from one system to another; or in terms of the 
Krylov subspaces generated [4]. The first case can be dealt with easily, even 
with direct solvers, by using the Sherman-Morrison formula; the second case 
requires augmentation strategies in order to eliminate the most penalizing part 
of the spectrum and improve the active condition numbeiQ [39] [16j [38] [50] ; and 
the last case calls for preconditioning techniques [40, 41 . 

While most of the studies of Krylov methods for multiresolution (often re- 
ferred to as the recycling of Krylov subspaces) are set in the framework of 
GMRes/MinRes [33 [50], we chose to work on the specific case of the resolu- 
tion of symmetric, positive definite systems using conjugate gradients (CGs), in 
which the convergence is under control and related to easily calculated spectral 
properties [49:. In earlier works, the authors developed efficient preconditioners 
based on previous Krylov subspaces [40[ [41] which took advantage of the con- 
jugation properties of CGs, but did not extract the most interesting part of the 
information available in the Krylov subspaces, and they proposed augmentation 
techniques using Ritz vectors [39] . Typically, these works were aimed at nonlin- 
ear mechanical systems solved by Newton-Raphson linearization and FETI or 
BDD domain decomposition p~8j [27] . 

1 "active" referring to the part of the spectrum of the matrix which is solicited by the 
right-hand side. 
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The recycling of Krylov subspaces can also be analyzed from the model re- 
duction point of view. Since Krylov solvers satisfy Petrov-Galerkin conditions, 
they share many common points with strategies based on Karhunen-Loeve ex- 
pansion [31j S21 ES]. These similarities are well-known p~5j [14j [21]. But our 
objective is not to develop reduced models of mechanical systems in order to 
perform fast but coarse analyzes; it is to define, improve and reuse reduced 
models in order to carry out calculations both rapidly and accurately. 

In this paper, we undertake a more in-depth investigation of augmentation 
using a selection of post-processed Ritz vectors. In Section 2, we begin with a 
detailed presentation of the theoretical framework of the augmented precondi- 
tioned conjugate gradient method; then, in Section 3, we propose a first reuse 
algorithm in a multiresolution framework (TRKS); in Section 4, we improve this 
algorithm by proposing a procedure for selecting the "best" Ritz vectors (SRKS 
and "cluster"); finally, in Section 5, we propose an evaluation of the method in 
the case of nonlinear mechanics and parametric problems, using domain decom- 
position methods [17 to define efficient preconditioners. 



2 The augmented preconditioned conjugate gra- 
dient method 

2.1 Algorithm and properties 

Let us consider the linear problem 

Ax = 6, (2) 

where A is an n x n symmetric positive definite matrix, and let us study the 
resolution of this system using the augmented preconditioned conjugate gradient 
algorithm. With M being the n x n symmetric positive definite matrix of the 
preconditioner, we introduce the following notations: 

i = . . . m the iteration number 

Xi the i th approximation (3) 

n = b — Axi = A(x — Xi) the i th residual 

With no loss of generality, the presentation can be limited to the case of a zero 
initial guess xoo = 0. (Otherwise, one can set b <— b — Axoo.) 

Let C be a subspace of R n of dimension n c , and let Matrix C = [ci, . . . , c n J 
be a basis of C. The search principle of the augmented left-preconditioned 
conjugate gradient is: 

/ find ^e/CiCM-^CM-Vo) m 

\ such that n _L /Q(M _1 A, C, M" 1 ^) l j 

where /Q(M" _1 A, C, M~ 1 tq) is the augmented Krylov subspace associated with 
preconditioned operator M _1 A and augmentation subspace C: 

Ki{M~ x A, C, M _1 r ) = span [m~ x t^ . . . , (M~ x A^-^M" 1 ^) ©C (5) 
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A classical implementation relies on the definition of a convenient initializa- 
tion and projector pair (xq,P): 

x-x^ + fy | c T AP = <= P = I-C(C T AC)- 1 C T A W 

One should note that since AP = P T AP = P T A augmentation preserves 
symmetry. One should also note that PC = 0. The system to be solved is: 

APy = (P T AP)y = r = P T b (7) 

The C-augmented, iVf -preconditioned conjugate gradient technique (APCG) 
implemented by projection is presented in Algorithm [l] (For the sake of sim- 
plicity, the methods will be described assuming exact arithmetic, even though 
they are compatible with more realistic full reorthogonalization [28].) 



Algorithm 1: APCG(A, M,C, b) 

Calculate AC, (C T AC)' 1 ; (P = I - C (C T AC) _1 C T A); 
x = C(C T AC)- 1 C T 6; 
r = b — Axq = P T b; 
z = PM _1 r , w = z ] 
for j = 1 , . . . , m do 

otj-i = (r j - U w j -i)/(Aw j - U w j -i) 

Xj = Xj-i + a j _ i Wj _ i 

Tj = Vj-i — aj-iAwj-i 

Zj PM~ l r j+1 

W 3 = Z 3 ~ Pj W J-l 

fa = (Awj-i,Zj)/(wj-i,Awj-i) 
end 



The following basic relations hold: 

(ri,^)=0, i + 3 ^ 
(wi,Awj)=0, i^j 

With Wi = [wq, . . . , Wi-i] and = [zq, • • • , ^i-i] being two bases of /Q(PM" _1 A, zq), 
the projector enables the spaces to be divided orthogonally: 

/Q(M _1 A, C, M-V ) = JCiiPM' 1 A, * ) "© C (9) 

Of course, in the absence of optional constraints (C = 0, P = J), APCG 
reduces to standard preconditioned conjugate gradients PCG(A,Ai", 6); if, in 
addition, M~ l = I, it becomes a standard conjugate gradient algorithm CG(A, 
b). 

Let us recall a first result which was proven in [6 for the case of non- 
preconditioned augmented conjugate gradients. 
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Proposition 1. Let V = Range(P) 

• APCG(A,I,C, b) is equivalent to CG(P T AP\ AV , P T b) in the sense that both 
generate the same residuals, xi, the i th APCG approximation, is connected to 
y if the i th CG approximation, by Xi Xq 

• APCG(A, I, C, b) does not break down; it converges, and its asymptotic con- 
vergence rate is governed by the condition number k [P t AP\av) ^ « (A). 

Consequently, augmentation strategies never decrease the asymptotic con- 
vergence rate. The following corollary is straightforward: 

Corollary 1. Let D = [di, . . . , d md ] be a set of linearly independent vec- 
tors such that E = [C, D] is a full column rank matrix. Let Pe = I — 
E{E T AE)~ 1 E T A, and let V E be the range of P E . 

• APCG(A,I,E, b) is equivalent to APCG(P T AP, I, D, P T b) in the sense 
that both generate the same residual, xf, the i th approximation of APCG (A, J, E, 
b), is connected to xf*, the i th approximation of APCG(P T AP , I, D, P T b), 
by xf = Xq + Pxf . 

• The asymptotic convergence rate is governed by k ^Pe T ' APe\aVe^ ^ K {P T AP\av) ^ 

K{A). 

In conclusion, an increase in the size of the augmentation can only improve 
the asymptotic rate of convergence. (In the worst case, it leaves it unchanged.) 

Now let us focus on the effect of preconditioning. Since M is a symmetric 
positive definite matrix, it can be factorized in Cholesky form M = LL T (where 
L denotes a lower triangular matrix with positive diagonal coefficients). Let us 
introduce the notation: 

A = L 1 AL T ; b = L- x b ; x = L T x ( , 

C = L T C ; P = I-C(C T AC)- 1 C T A [ j 

Then, the following equivalence between preconditioned and non-preconditioned 
augmented conjugate gradients holds: 

Proposition 2. APCG(A,M,C, b) is equivalent to APCG(A, I, C, b) with 
r = L~ x r — z — L T z, w — L T w, a — a and f3 — f3. Its asymptotic convergence 
rate is governed by k (^P t AP^ 

Proof Since P = L T PL T , we obtain directly x = C(C T AC)- 1 C T b = 
L T xo, t'q = b — Axo = L~ 1 ro — z$ = L T zo and wo = L t wq. By induction, 
it follows that <%-i = (rj-i,Zj-\)/(Awj-i,Wj-i) = fj = L~ x r^ f3j = 

(Avjj-i, Zj) I (Avjj-i, Wj-i) = fij and Wj = L T wj. Proposition [l] provides the 
inequality concerning the asymptotic convergence rate. □ 

Putting these propositions together, APCG(A, iVf, C, b) is equivalent to 
CG(L _1 P T APL _T , L~ 1 P T b). All these results lead us to propose an efficient 
augmentation by analogy with an equivalent, simpler system solved by classical 
conjugate gradients. 
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2.2 Interpretation and choice of the augmentation 

From a "constraint" point of view, the projection guarantees the C-orthogonality 
of the residual throughout the iterations (C T Tj = 0). For example, in the FETI 
domain decomposition method, the residual is the displacement jump between 
the subdomains; in the case of shell and plate problems, matrix C is intro- 
duced to enforce the continuity of the displacement at the corner points [12] . In 
the BDD domain decomposition method, matrix C is associated with the rigid 
body motions of floating substructures and, therefore, local Neumann problems 
in the preconditioner are always well-posed [29]; for shell and plate problems, 
the matrix is enriched by corner mode corrections [26 . In both cases, matrix 
(C T AC) _1 , called a coarse grid matrix, plays a crucial role in the scalability 
of these methods. 

From a "spectral" point of view, augmentation can be used to decrease 
the active condition number ("active" referring to eigenelements solicited by 
the right-hand side) and, thus, improve the asymptotic convergence rate. This 
is called a deflation strategy [HI [5], which boils down to building matrix C 
by using (approximate) eigenvectors associated with the lowest eigenvalues. 
Obviously, when C consists of the n c eigenvectors associated with the low- 
est eigenvalues (Ai ^ ... ^ A nc ^ ... ^ A n ), the condition number decreases 
strictly: k (P t AP av ) = < ^. 

From a "model reduction" point of view, subspace C represents a "macro" 
(or coarse) space in which the macro part of the solution is calculated directly 
during the initialization while the "micro" part of the solution, when required, 
is obtained during the iterations. 

2.3 Estimation of computation costs 

With regard to the numerical cost of augmentation, the main operations for 
the construction of the projector are: (i) the block product AC (and assembly 
with neighbors for domain decomposition methods), (ii) the block dot-product 
(C T AC) (plus an all-to-all sum for domain decomposition methods), and (hi) 
the factorization of the fully-populated coarse matrix (C T AC). Then, the ap- 
plication of the projector consists simply of (i) one block dot-product ((AC) T x) 
(plus an all-to-all exchange), (ii) the resolution of the coarse problem, and (hi) 
the matrix- vector product (Ca). 

Thus, provided that the number of columns of matrix C is small, the main 
cost is related to the calculation of AC. One must bear in mind that block 
operations (on "multivectors" ) are comparatively much faster than single vector 
operations, especially when the matrices are sparse (because data fetching is 
factorized). In a domain decomposition context, product AC corresponds to 
the resolution of Dirichlet or Neumann problems in substructures, which makes 
the simultaneous treatment of many columns very efficient (and minimizes the 
number of exchanges). One must also remember that a conjugate gradient 
iteration involves a preconditioning step which may be expensive. (The cost is 
comparable to that of an operator product in optimal domain decomposition 
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methods.) Thus, the additional cost of augmentation relative to the cost of one 
iteration depends on many parameters (the size of the problem, the number of 
augmentation vectors, the number of subdomains, the preconditioner chosen...). 
Typically, in the examples presented in this paper, we found that, using an 
optimal preconditioner, the CPU cost of between 4 and 7 augmentation vectors 
(depending on the hardware configuration) cost no more than one CG iteration. 

A question which is not addressed in this paper is the verification of the 
full-rank property of matrix C, which affects the quality of the factorization 
of matrix (C T AC). Strategies to correct a dependence among the columns of 
matrix C due to inexact arithmetic can be found in pQ. 



3 Total reuse of Krylov subspaces 

In this section, we show how it is possible to define efficient augmentation strate- 
gies in a multiresolution context. Let us consider the sequence of linear systems: 

A(*V*>=&W , fc = l,...p (11) 

where is an n x n symmetric positive definite matrix and is the right- 
hand side. Each linear system is solved using an augmented preconditioned 
conjugate gradient algorithm APCG(A (fc) , M^ k \C^ k \ Let be the 

number of iterations which is necessary to reach convergence, and let 



(k) (fe) 



(12) 



be a basis of the associated Krylov subspace. 

As explained in the previous section, augmentation never increases the con- 
dition number which governs the asymptotic convergence rate. More precisely, 
the presence of active eigenvectors of the current preconditioned problem in 
may increase the efficiency of the iterative solver significantly. Classical strate- 
gies can be used in the case of invariant preconditioned operators (A^ — A, 
AfW = M) and multiple right-hand sides. 

It is more difficult to define efficient strategies in the general case of varying 
operators with no information available on their evolution. A simple and natural 
idea is to reuse previous Krylov subspaces. A first algorithm which reuses all 
the previous Krylov subspaces is Total Reuse of Krylov Subspaces (TRKS) 
(Algorithm [2|, which needs only a few comments: 



Since (according to ^) C (/c)T A^wffl = 0, the vectors of the concate- 
nated matrix C^ +1 ^ are linearly independent. Therefore, APCG(A^ fe+1 \ 
M (/c+i) 5 C(fc+i) 5 does not break down and converges. 

• The previous Krylov subspaces are fully reused through concatenation 
without post-processing; the only downside is that the memory require- 
ments increase due to the need to save the Krylov subspaces. 
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• If the number of columns of matrix becomes too large, the method 
may become computationally inefficient, even though the number of iter- 
ations decreases considerably. Nevertheless, TRKS probably leads to the 
best reduction in the number of iterations achievable by reusing Krylov 
subspaces. Therefore, it can be used as a reference in terms of the reduc- 
tion of the number of iterations for any other algorithm based on a reuse 
of Krylov subspaces. 

• One possible way to reduce the cost of TRKS without reducing the size 
of C consists in using approximate solvers, as in the IRKS strategy [41]. 



Algorithm 2: TRKS-APCG 



Initialize = Co (an n x mo full-rank matrix); 
for k = 0, . . . ,p — 1 do 
Solve AWxW = 

with APCG(AW, M^ k \C^ k \ &W); 



Define W, 



(k) 



...,w 



(fe) 



Concatenate: C^ k+1 ^ 



end 



In order to reduce the cost associated with the total reuse of Krylov sub- 
spaces, we propose to work on extracted sub-subspaces, an operation often re- 
ferred to as the recycling of Krylov subspaces. The objective is to retain the 
smallest number of independent vectors which achieve the greatest decrease in 
the number of iterations. Clearly, the most effective approach would be to 
calculate approximate eigenvectors from the previous Krylov subspaces for the 
current operator. However, because of the variability of the operators, the ex- 
traction of such information would be extremely time consuming and would 
affect the global efficiency. Conversely, approximate eigenvectors of previous 
problems can be calculated from the associated Krylov subspaces at nearly no 
cost. In the following section, we describe an efficient algorithm for the extrac- 
tion of such approximation vectors along with a simple selection procedure to 
recycle only a few of these vectors. Of course, the performance of our method 
depends on the stability of the eigenspaces from one system to another. This 
topic, especially concerning the lower part of the spectrum, is discussed in [24] . 



4 Selective recycling of Krylov subspaces 

The standard convergence of conjugate gradients corresponds to an asymptotic 
convergence rate. Using this property to predict the number of iterations n e 
which is required to reach an accuracy level e cg leads to a huge overestimation. 
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Indeed, one has: 




\K-x\\ k ^ y ^' ^ - ' ln(a lin ) 



with a 



This result alone cannot explain the improvement in the convergence rate 
observed during the iteration process. This superconvergence phenomenon can 
be explained by a study of the convergence of Ritz values [33 which enables 
one to define an instantaneous convergence rate [49]. This explanation can be 
improved by a study of the influence of the distribution of the eigenvalues [35j [2] . 

The objective of recycling Krylov subspaces is to find the best augmentation 
space in order to trigger superconvergence quickly. This section is organized as 
follows: we start with a review of Ritz eigenelement analysis and continue with 
a brief presentation of the improved convergence results; these results lead to a 
number of selection strategies, which will be assessed in Section [5] 

4.1 Ritz analysis: theory and practical calculation 

For ^ i < m, Ritz vectors (y^) and values (0 m ) are approximations of the 
eigenvectors and eigenvalues of the symmetric positive definite matrix A; their 
definition is similar to that of the iterates in the conjugate gradient algorithm 



f find (t^GU^fioJxK (M) 

\ such that Ay m - 6 l m y l m _L JC m (A,v ) 

The symmetric Lanczos algorithm [45 enables one to build a particular or- 
thonormal basis of /C m (A, £>o), denoted V m . Then, the search principle becomes: 

y i m = V m qL V^AV mq i n = e i m qt n (15) 

The Lanczos basis Vm makes the Hessenberg matrix H m = V^AV m symmet- 
rical and tridiagonal. V m and H m can be recovered directly from the conjugate 
gradient coefficients [44] : 




(•••' ( 1)J |^ll'-")o^<m 

tridiag(^_i,(5 j ,^)o^ < rn 



(16) 



•fl A r 1 Pj-1 

with do = — , dj = h — — , rjj 



Since matrix H m is symmetrical and tridiagonal, its eigenelements (0™, g™)i^j^ m 
can be calculated easily, for example using a Lapack procedure. Let us de- 
fine m diag(<9^ ^ ... < 6™) and Q m such that H m 
Qm®mQm T - ®m an d Y m = V m Qm are the Ritz values and associated Ritz 
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vectors, which are approximations of the eigenelements of operator A and sat- 
isfy: 

y T iy — (p\ o nr ] V T Y — T 



We presented Ritz analysis for the equivalent symmetric system described 
previously because symmetry simplifies the calculation of eigenelements, but 
the analysis can be transferred back to the left-preconditioned system using the 
following transformation rules: 



V m = L 1 V m 



T \ -fy (17) 



(^^i) 1/2 "" 

H m = ff m = AV rn 

Y rn = 7v 1^ = VrnQrn 

The Ritz vectors are the solution of a generalized eigenproblem and satisfy the 
following orthogonality properties: 

Y£AY m = G m and Y^MY m = I m (18) 

One can show that when m increases the Ritz values converge toward the 
eigenvalues of A, and that the convergence is either from above or from below 
depending on their rank [49j [51] : 

#m ^ #m-l ^ #m ^ • • • ^ #m _1 ^ #m-l ^ #m ( 19 ) 

In addition, in the case of clearly distinct eigenvalues, the convergence of a Ritz 
value results in the convergence of the associated Ritz vector. 



4.2 Relation between the convergence of conjugate gradi- 
ents and the convergence of the Ritz values 

In [49], the superconvergence phenomenon is explained by the convergence of 
the Ritz values through the definition, at each iteration, of a instantaneous 
convergence rate associated with the part of the spectrum that is not yet ap- 
proximated correctly by the Ritz values: at a given conjugate gradient iteration, 
one can find a deflated system (with some of its extreme eigenvalues removed) 
with similar behavior. Let [X n .., AJ be the spectrum of the deflated operator. 
The equivalent convergence rate is: 

II*-s 4+ iIIa< F tAr 2 a t Jx - Xi \l (20) 
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where F. l r quantifies the convergence of the / smallest and r largest Ritz values 
to the extreme eigenvalues: 



F , max jW max 

*' ,r V>1 l > 1 r'^r r > r 
I 



A. 

J 

A 



Since this result holds for every pair (Z,r), the effective convergence rate at 
Iteration z corresponds to the pair (7, r) which minimizes a . z r = F. l r cr l r . 

Then, after some iterations, the superconvergent conjugate gradient algo- 
rithm behaves very much like a conjugate gradient algorithm augmented by 
the extreme eigenvectors which are associated with the converged Ritz values. 
In a multiresolution context, provided the linear systems have similar spectral 
properties, the Ritz vectors associated with the converged Ritz values obtained 
for one system should define a viable augmentation space for the subsequent 
resolutions. 



4.3 Effect of the distribution of the eigenvalues 

The effect of the distribution of the eigenvalues on the convergence of conjugate 
gradients was studied in [35j [2]. The results take into account the fact that 
preconditioning often leads to clustered eigenvalues as opposed to uniformly 
distributed eigenvalues, as can be seen in Figure [I] 

In addition to other results, the authors showed that if a spectrum consists of 
p isolated eigenvalues in the high part of the spectrum, p isolated eigenvalues in 
the low part of the spectrum and n—2p uniformly distributed central eigenvalues, 
then the conjugate gradient convergence takes the form: 



, In (e 12) YJLi In(%^(l-^- 
>n e =2p + int - V C9 ' } \ * V n ~ p+l// 21 

The convergence rate is approximately equal to the classical convergence rate 
for the central part, plus one iteration per higher eigenvalue and a little more 
than one iteration per lower eigenvalue. These results can be combined with the 
work by Jiao j5TJ [22] on the convergence of Ritz values. In general, since the 
method is related to the power iteration method, a correct approximation by 
the Ritz values is obtained first for the highest eigenvalues, then for the lowest 
part of the spectrum, resulting in superconvergence (which is governed by the 
asymptotic convergence rate of the reduced spectrum). 
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4.4 Selection procedures 



The results of Section |4.2| lead to a first proposal of a selection procedure for 
converged Ritz vectors: convergence is identified by the stagnation of the Ritz 
values; if the conjugate gradient algorithm converges at iteration m, the Ritz 
values are calculated for the previous two states m and m _i. Once ranked, 
the m most recent Ritz values m are compared to the m — 1 previous values 
according to the following criteria: 



\0 j - j | 

J m has converged if — ^ — . 771-1 < e, l^j^m— 1 

\0m\ 

\gm-j _ Qm-l-ji 

O™- 3 has converged if — — — - ^ e, 0^j^m-2 



(22) 



where e is a user parameter which is easy to adjust since the criterion is generally 
either very high (before the convergence of the Ritz value) or very small (after 
convergence). Figure [l] illustrates that property with the simple example of 
the operator associated with the decomposition of a linear elastic cube into ten 
subdomains; in that case, the higher half of the spectrum has converged. 



DC 6- 



■ Spectrum (histogram) 
► Convergence criterion 



2.0 2.5 3.0 

Log of Ritz value 



Figure 1: Ritz spectrum and convergence of the Ritz values 

The principle of the selective recycling of Krylov subspaces (SRKS-APCG) 
is described in Algorithm [3j Basically, in addition to the memory required by 
APCG, the SRKS-APCG algorithm requires storage for m n- vectors {zj)j = \ j7n . 
One should note that the selected vectors are normalized by the square root 
of the associated Ritz value in order to improve the condition number of the 
coarse matrix. (If operator A remained constant, matrix (C T AC) would be 
the identity matrix.) 

For better computational efficiency, a restart parameter can be introduced 
in order to limit the size of the augmentation space associated with parameter 
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Algorithm 3: SRKS-APCG 



Initialize = Co (full column rank matrix); 
for k = 0, . . . ,p — 1 do 

• Solve A^xW = with APCG(A^ k \M^ k \C^\b^); 



• Define V rn = 



...,i-iy 



(^) 1/2 '" 

• Define iJ m = tridiag(r/ J _i, ^, 77j)o<j <m 

• Compute eigenelements (Q m , m ) of H m (9^ ^ ... ^ #™); 

• Compute F m = V m Q m = [y^, . . . , y™\ ; 

• Extract if m _i = tridiag(7fr_i, ^, 77j)o<i<m-i 5 

• Compute eigenvalues of ff m _i ; 



for j = 1 , . . . , m — 1 do 



C 



end 



• Concatenate C^ +1 ) = [C^jC] , C = [0]; 

• If dim(C^ +1 )) ^ n Clim , then = 



end 



n Clim in Algorithm [3] This limit size can be set after a complexity analysis 
under the assumption that all non-augmented systems would be solved in the 
same number of iterations. However, we did not use such a restart procedure in 
our experiments. 

In order to be even more selective, we propose a reselection strategy based 
on a prediction of the efficiency of the retained vectors. Indeed, the results of 
Section [473] in terms of the effect of the distribution of the eigenvalues lead us 
to retain only the converged Ritz vectors which belong to the external part of 
the spectrum: 

• this is known to be the first part of the spectrum whose approximation by 
Ritz values is good; 

• since the convergence of Ritz vectors is identified by the stagnation of the 
associated Ritz values, the fact that the external Ritz values are distinct 
ensures that the Ritz vectors approximate the eigenvectors correctly [ST]; 

• while choosing vectors in the dense central zone does not modify the shape 
of the spectrum and does not improve convergence, selecting the external 
part of the spectrum triggers superconvergence instantly. 

In order to select only the external part of the spectrum, we implemented 
the cluster identification algorithm proposed in [32 . This algorithm seeks the 
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piecewise constant distribution which is nearest (in a least squares sense) to the 
distribution of the distances among the sorted eigenvalues. The only parameter 
required is the minimum size of the cluster, which we set at one-fifth the number 
of preselected vectors. As will be shown in the next section, the performance 
achieved with this reselection algorithm is not outstanding, but some results in 
terms of gain per augmentation vector are worth considering. 

5 Numerical assessments 

We present three numerical experiments. Two concern the evaluation of a struc- 
ture made of random materials, as is the case in a Monte-Carlo simulation. In 
the first case, the materials are elastic; in the second case, which is a nonlinear 
problem, they are elastic-plastic. The last case is a large displacement problem, 
which raises specific difficulties. 

The methods were implemented in the ZEBULON code [34] and paral- 
lelism was introduced using MPI. The calculations were performed on the LMT- 
Cachan cluster, which consists of dual quadcore and dual hexacore processors 
connected by a gigabit network. The calculations were always carried out on 
homogeneous sets of processors which were entirely dedicated to one task which 
fit entirely in memory, so swapping was not necessary. In each case, we indicate 
the CPU time which measures the amount of work performed for one subdo- 
main. The Wall Clock Time (WCT), a global measure which is more sensitive 
to external perturbations induced by the operating system and the presence of 
other users, was considered to be unreliable in many cases; so we mention it 
only for the first set of experiments. One should note that the gains calculated 
with WCT were always greater. 

The CPU plots show the total time as well as the time dedicated to aug- 
mentation (preparation of the coarse operator, initialization and projections); 
the difference represents the iterations of the solver. 

All the calculations used a dual formulation of the interface problem through 
domain decomposition (FETI). The convergence was evaluated using the norm 
of the residual (which corresponds to the displacement gaps at the interfaces) 
normalized by the condensed right-hand side. Classically for such structural 
problems, total reorthogonalization was used to enforce the A-conjugation of 
the search directions. (The case without reorthogonalization is discussed briefly 
in the first example.) 

5.1 The case of a sequence of linear systems 

We considered a cube (of side 50 mm) with 4 x 4 x 4 = 64 small cubic inclusions 
(of side 5.5 mm). A slice through this structure is shown in Figure [5] The 
cube was clamped over one side, and the opposite side plus another side were 
subjected to uniform pressure. The mesh consisted of 125, 000 linear hexahedral 
elements for a total of 400, 000 degrees of freedom. Three automatic decomposi- 
tions (into 12, 48 and 96 subdomains) were performed using the Metis algorithm 
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[23] (see Figure [3|. The resulting interface system contained 54,000 unknowns 
for the 12-sub domain decomposition, 96, 000 unknowns for the 48-subdomain 
decomposition and 133, 000 unknowns for the 96-subdomain decomposition. All 
the materials were isotropic, linear and elastic, and were characterized by their 
Young's modulus and Poisson's coefficient. The material properties of each in- 
clusion and of the matrix were chosen randomly following a normal law with 
a relative standard deviation equal to 10%, leading to a ±23% variation range 
about the nominal value. The average Young's modulus was 200 MPa for the 
matrix and 20, 000 MPa for each inclusion, and the average Poisson's coeffi- 
cient was 0.27 for the matrix and 0.35 for each inclusion. The objective was to 
perform the calculations for 40 draws of the 130 coefficients. 

^^^^^ 

Figure 3: The decomposition into 48 
subdomains 

We used a dual formulation (FETI) with both a Dirichlet (optimal) and a 
lumped preconditioner, leading to 10 -3 and 10 -6 APCG accuracy respectively. 
We considered the following algorithms: conjugate gradients (eg), total reuse of 
Krylov subspaces (trks) and selective reuse of Krylov subspaces with two values 
of the criterion, e = 10 -6 (srks6) and e = 10 -14 (srksl4). In addition, in the 
last case (e = 10 -14 ), we also attempted to further refine the selection by not 
selecting the converged Ritz values contained in the central cluster (identified 
by the algorithm proposed by [32 j); this method is labeled (clustl4). 

5.1.1 Comparison of the strategies 

The results for the 12-subdomain decomposition are summarized in Table [I] 
which gives the average number of APCG iterations to convergence, the average 
size of the augmentation space, the final size of the augmentation space, the 
average CPU and wall clock times per system, from which we also deduced the 
augmentation time (operator preparation and projection). Computations were 
conducted one dual hexacore processor (one subdomain per core). When the 
average and final sizes of the augmentation space are close, this means that 
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Table 1: Performance summary for the cube with inclusions 



most of the augmentation space was identified with the first systems. For a 
given configuration (accuracy and preconditioner) , the figures in bold in the 
three columns 'average number of iterations', 'average CPU time' and 'average 
wall clock time' indicate the best strategy in terms of gain per unit augmentation 
vector compared to CG. 

For the 12-subdomain decomposition, Figures Q [6j [8| (for an objective of 
10 -3 accuracy) and Figures ([5j[7|[£|) (for an objective of 10 -6 accuracy) give 
the evolutions of the number of APCG iterations to convergence for each linear 
system, the dimension of the augmentation space n c and the the CPU time for 
the resolution of each system, with both lumped and Dirichlet preconditioners. 

Without full reorthogonalization, the performance was very poor and led to 
about twice the number of iterations of the recommended fully reorthogonalized 
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Figure 4: Cube 12 subdomains, lumped, Figure 5: Cube 12 subdomains, Dirich- 
10~ 3 accuracy, number of iterations per let, 10~ 6 accuracy, number of iterations 
linear system per linear system 




System number 



System number 



Figure 6: Cube 12 subdomains, lumped, Figure 7: Cube 12 subdomains, Dirich- 
10~ 3 accuracy, dimension of the aug- let, 10 ~ 6 accuracy, dimension of the 
mentation space augmentation space 



conjugate gradients. This was expected because systems resulting from domain 
decomposition formulations are known to often require full reorthogonalization 
[T3] . Furthermore, one should note that the additional iterations carried out in 
the non-reorthogonalized case led to vector sets which made the Ritz analysis 
more complex due to the appearance of nonphysical, multiple eigenvalues. The 
non-reorthogonalized approach was no longer considered in the other examples. 

With the TRKS approach, two types of behavior were observed. In the low- 
accuracy case (10 -3 ), for both preconditioners (but especially for the lumped 
preconditioner), the size of the augmentation space reached a plateau, which 
means that the augmentation space contained almost all the required informa- 
tion; the gains in terms of both the number of iterations (> 90%) and the CPU 
time (> 55%) were excellent. In the high-accuracy case (10 -6 ), the size of the 
augmentation space never stabilized; therefore, even though the number of it- 
erations decreased drastically, the CPU time increased. Table [2] gives extended 
performance results for TRKS which confirm this analysis. The gains are given 
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Figure 8: Cube 12 subdomains, lumped, Figure 9: Cube 12 subdomains, Dirich- 
10" 3 'accuracy, CPU time per linear sys- let, 10~ 6 accuracy, CPU time per linear 
tern system 



relative to conjugate gradients. The efficiency of augmentation is defined by 
the average decrease in the number of iterations per augmentation vector; the 
higher the required accuracy, the less efficient the TRKS approach. These re- 
sults justify our decision to select the subspaces so that the dimension of the 
augmentation space would remain under control. 

The SRKS14 approach succeeded in limiting the size of the augmentation 
space and led to a satisfactory decrease in the number of iterations. As can be 
seen on the figures, SRKS6 did not stabilize the augmentation space as efficiently 
and behaved half way between TRKS and SRKS14; therefore, we will choose 
SRKS14 as our reference algorithm from now on. 

The cluster strategy as it stands today gave unsatisfactory results: even 
though it often led to the best gain per augmentation vector, it seemed to impair 
the selection of useful vectors and allow much less reduction in the number of 
iterations than SRKS. After the resolution of many systems, it tended to lead 
to the same augmentation space as SRKS. 

To confirm that hypothesis, we compared the spaces Csrks and C c i us t er 
after the 40 resolutions for the low- accuracy Dirichlet case. We used the follow- 
ing procedure: first, the vectors were orthonormalized using SVD: C = UTy T ] 
then SVD was applied to the concatenated matrix [Usrks, U c i uste r}' A plot of 
the singular values is shown in Figure [Toj Independent spaces would lead to a 
constant value equal to 1, while for nested spaces the common space would lead 
to {V% 0} pairs of singular values. One can observe that the spaces are not 
exactly nested, but come quite close. 

In conclusion, the cluster strategy is not mature yet, but it is promising. 
It was not considered for the following experiments because, due to the larger 
number of systems involved, it would behave quite similarly to SRKS. 

5.1.2 Study of SRKS 14 in various configurations 
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Table 2: Relative performance of TRKS 



Table [3] shows the relative performance of SRKS14 as a function of the num- 
ber of subdomains, of the preconditioners and of the accuracy. The efficiency of 
the augmentation is defined as the decrease in the average number of iterations 
per augmentation vector. One can observe that the efficiency ranged between 
0.5 and 0.85 and was best for the lower accuracy and the improved precon- 
ditioner. For these spectra in which there exist no small isolated eigenvalues 
(which could lead to efficiencies greater than 1), such results are consistent with 



the theory (see Section 4.3). In the next section, we will see that this moderate 
efficiency does not preclude significant CPU improvements. 

The gains in terms of the number of iterations were relatively stable, typically 
between 50% and 60% in the high- accuracy case. 

5.1.3 Influence of the hardware configuration on the CPU gains 



Now, let us study the performance of SRKS14 in terms of CPU time for the 
same decomposition into 48 subdomains, but using different hardware configu- 
rations: 

1. Configuration A corresponds to 4 dual hexacore nodes with 1 subdomain 
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Table 3: Iteration gains for SRKS14 



per core; 

2. Configuration B corresponds to 6 dual quadcore nodes with 1 subdomain 
per core; 

3. Configuration C corresponds to 3 dual quadcore nodes with 2 subdomains 
per core; 

4. Configuration D corresponds to 2 dual quadcore nodes with 3 subdomains 
per core. 

One can note that the processors in Configuration A were different from those 
used in the other cases. In all the cases, the memory was sufficient to avoid 
swapping. The results are given in Table [4] for the Dirichlet preconditioner and 
in Table [5] for the lumped preconditioner. One can see that Configurations B,C 
and D had similar performances and were slower than Configuration A due to 
the different memory technology. 

One interesting factor is the ratio of the average CPU cost of an iteration to 
the average CPU cost of an augmentation vector (the last columns of Table [4] 
and [5]). One can see that in Configuration A, 4 augmentation vectors cost no 
more than one iteration; in the other configurations 7 augmentation vectors 
cost no more than one iteration. Since we saw that one needs about 1/0.6 ^1.6 
augmentation vectors to save one iteration, the advantage of augmentation is 
clear. Indeed, we observe a 32% CPU improvement in Configuration A and a 
40% to 50% improvement in the other configurations. 

Note that when the lumped preconditioner is used the equivalent cost of 
an iteration is only 2.8 augmentation vectors in Configuration A and 4.5 aug- 
mentation vectors in Configuration D (see Table [5J. Since the efficiency of the 
augmentation vectors is less when this inexpensive preconditioner is used (in 
the high accuracy case), so is the CPU improvement. 
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Configuration 


CG avg. CPU 


CPU gain 


CPU per iteration / 
CPU per augm. vector 
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29.7 
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29.9 
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7.8 



Table 4: CPU performance of SRKS14 for 10" 6 accuracy with the Dirichlet 
preconditioner 



Configuration 


CG avg. CPU 


CPU gain 


CPU per iteration / 
CPU per augm. vector 


A 


10.7 


22.4% 


2.8 


D 


27.4 


33.3% 


4.5 



Table 5: CPU performance of SRKS14 for 10 6 accuracy with the lumped 
preconditioner 



The ratio of the CPU time per iteration to the CPU time per augmentation 
vector for SRKS (Column 4 of the previous tables) turned out to be relatively 
stable for a given machine with a given preconditioner. This is due to the 
stability of the size of the augmentation space which prevented the cost from 
soaring (as would happen with TRKS). Thus, the CPU performance can be 
deduced from the iteration gains and the augmentation efficiency (see Table [3]). 
For instance, the CPU gain for SRKS with the 96-subdomain decomposition 
was slightly greater than 50%. 



5.2 The case of a sequence of nonlinear problems 



Now let us consider a hexahedral holed plate (10x10x0.2 mm with a center hole 
of radius 1 mm, see Figure 11) subjected to unidirectional tension (a prescribed 
normal displacement). The plate was discretized into 61,000 linear hexahedral 
elements for a total of 41,000 degrees of freedom. The structure was divided 
into 8 subdomains using the Metis algorithm, which resulted in an interface 
system with 3, 000 unknowns. The problem was solved using one 8-core proces- 
sor (one subdomain per core). Elastic-plastic behavior with nonlinear isotropic 
hardening and a Von Mises'-type plasticity criterion was assumed. Denoting <j 
the Cauchy stress tensor, e(u) the symmetric gradient of the displacement field 
ix, and /C the Hooke tensor, the material law can be written as: 



if f{p) = then e 
if f (a) ^ then & 

f(a) = J I a: a - (R + Q (l - e~ bX )) 



a = JC : 

iP = A/, e 
& = 



(23) 
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The coefficients were assigned a normal law with a 10% relative standard de- 
viation, which implied variations of up to ±23% in the coefficients. The mean 
values of the material parameters were: E = 200, 000 MPa, v = 0.3, Ro = 300 
MPa, b = 22 and Q = 170 MPa. The loading was applied in two steps: first, 
a single increment to reach the elastic limit; then, 16 equal increments in order 
to multiply the prescribed displacement by 4. The objective of the study was 
to analyze 21 configurations. 

Again, the linear solver used was FETI with a Dirichlet or lumped precon- 
ditioner. The accuracy objective for the linear systems was set at 10 -6 . (The 
accuracy must be high for the nonlinear process to run well). Because of the 
approximations, not all the methods converged in the same number of Newton 
iterations; on average, one nonlinear analysis required the resolution of 95 tan- 
gent systems. Table [6] summarizes the performances of the various methods; 
Figure [l2| shows the evolution of the average number of APCG iterations with 
the lumped preconditioner during the sequence of linear systems; Figure 



shows the evolution of the size of the augmentation space; Figures 14 and 
show the evolutions of the average CPU time and wall clock time for the resolu- 
tion of one linear system along with the evolution of the average augmentation 
time (operator creation and projection). 




Figure 11: The holed 
plate example (plastic 
strain) 
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calculation too slow, was stopped before all the systems were 
solved 

Table 6: Holed plate, performance summary 
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Figure 12: Plate, lumped - ay 
linear system 
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Figure 14: Plate, lumped - avg. CPU Figure 15: Plate, lumped - avg. wall 
time / system clock time / system 

This test leads to conclusions similar to the previous ones. More specifi- 
cally, one can observe that the SRKS augmentation space selected after the first 
nonlinear configuration remained stable. Conversely, since the TRKS augmen- 
tation space never reached a plateau, the solutions of the linear systems did not 
belong to a common space. SRKS was the most efficient method, leading to a 
20% CPU gain and a 36% wall clock time improvement. 



5.3 The case of a large displacement problem 

Finally, let us consider the problem of the buckling of a straight heterogeneous 
beam with a circular cross section (length/diameter ratio equal to 30), clamped 
at one end and subjected to an axial pressure at the other, with no radial 
displacement. The heterogeneities consisted of five straight fibers whose stiffness 
was 1, 000 times that of the matrix. The problem was formulated in the updated 
Lagrangian framework, assuming linear elastic behavior (characterized by the 
Young's modulus and Poisson's coefficient) in the current configuration. The 
beam was discretized into 90, 000 linear hexahedral finite elements for a total of 
300, 000 degrees of freedom. It was divided into 10 subdomains using the Metis 
algorithm, leading to an interface system with 16, 000 unknowns. A single 12- 
core processor was used (1 subdomain per core, leaving 2 inactive cores). The 



pressure was applied incrementally up to the configuration shown in Figure 18 
in which the maximum axial displacement was about 3% of the total length. 
12 increments were used, leading to the resolution of about 30 tangent linear 
systems. 

We used a FETI solver with a Dirichlet preconditioner and an "identity" 
projector. The FETI convergence criterion was set to 10 -6 . Figure 16 shows 



the evolution of the number of conjugate gradient iterations required for the 



resolution of each linear system. Figure 17 shows the evolution of the size 
of the augmentation space. Three algorithms were tested: classical conjugate 
gradients, total reuse of subspaces, and selective reuse of subspaces (e = 10 -14 ). 
Table summarizes the main results. 

The following observations can be made: 
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5 10 15 20 25 30 5 10 15 20 25 30 



System number System number 

Figure 16: Buckling of the heteroge- Figure 17: Buckling of the heteroge- 
neous beam: number of iterations per neous beam: dimension of the augmen- 
linear system tation space for each linear system 




Figure 18: The beam in the refer- 
ence and deformed configurations, 
with a view of the cross section 

• The performance of TRKS was really impressive because the systems con- 
verged in 10 times fewer iterations than with CG, but the size of the 
associated augmentation space was large (up to 208 vectors) and never 
ceased to increase; 

• SRKS appeared to be efficient: the number of iterations was divided by 3 
with a space whose size increased slowly, then reached a plateau; 

• With the hardware configuration used, the best results in terms of com- 
putation time (a 48% CPU improvement) were achieved with TRKS, but 
the gain normalized by the number of augmentation vectors was better 
with SRKS (a 29% CPU improvement). SRKS was truly successful in 
controlling the dimension of the augmentation space. 

In this example, the augmentation proved to be very efficient in terms of the 
iteration gain per augmentation vector, especially for SRKS (0.85). This was 
probably because of a specificity of the spectrum of the preconditioned operator 
due to the use of domain decomposition in large displacements. Indeed, the 
tangent matrix of a floating subdomain with prescribed Neumann conditions 
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Table 7: Recycling performance for the 
buckling problem 



may become non-positive, contrary to the Dirichlet operator which remains 
positive definite. It is known that a slight lack of positivity of the operator 
does not prevent reorthogonalized conjugate gradients from converging [37], but 
convergence is slower than when all the eigenvalues are positive. The positivity 
of the preconditioner makes the selection procedure still possible. Moreover, 
the negative eigenvalues are systematically selected by the procedure, so using 
augmentation causes the solver to iterate in the subspace in which the operator 
is positive, leading to a much better convergence rate. 

6 Conclusion 

This paper dealt with the resolution of sequences of large linear systems with 
varying matrices and right-hand sides using conjugate gradients. We proposed 
several algorithms based on the augmentation of the current Krylov subspace by 
a selection of previously generated subspaces. The advantage of these methods 
is that some of the iterations are replaced by the preprocessing of a coarse 
problem associated with optimized operations. 

When low accuracy is sufficient, total reuse of the previous subspaces (the 
TRKS algorithm) appears to lead to satisfactory results. When high accuracy 
is required, the subspaces are too unstable, which causes the dimension of the 
TRKS augmentation space to soar. Therefore, we proposed to retain only the 
part of the subspace generated by the Ritz vectors associated with converged 
Ritz values of the preconditioned operator (the SRKS algorithm). These vectors 
can be built very inexpensively. Such an augmentation was found to remain 
stable throughout the linear systems and to lead to a reduction in the number 
of iterations which is consistent with the theory. In terms of computation time, 
the proposed method leads to a variable, but always positive, gain compared to 
non- augmented systems. We observed CPU time improvements of 20% to 50%, 
and wall clock time improvements of 40% to 70%. 

Up until now, our attempts to improve the selection algorithm by eliminat- 
ing the converged values in the central part of the spectrum have not led to 
impressive results. This probably means that the Ritz vectors associated with 
converged Ritz values contain meaningful information which cannot be removed 
from the analysis of the current system. A continuation of this work could con- 
sist in a better analysis of the accumulation of the augmentation vectors. This 
can be done by studying the coarse matrix C T AC whose distance to the identity 
matrix characterizes the variation of the Krylov subspaces. Another objective 
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would be to port some of the ideas presented in this paper to nonsymmetric 
solvers. 
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